Skip to content

Conversation

Nemo157
Copy link
Member

@Nemo157 Nemo157 commented Mar 6, 2021

Reduces the impact from failures like #1305, rather than continuing to build crates with the failing toolchain we will just stop building crates until an admin has time to investigate the cause.

Example log when this happens:

web_1         | 2021/03/06 10:38:14 [INFO] docs_rs::docbuilder::rustwide_builder: copying essential files for rustc 1.52.0-nightly (caca2121f 2021-03-05)
web_1         | 2021/03/06 10:38:14 [ERROR] docs_rs::docbuilder::queue: Updating toolchain failed, locking queue: couldn't copy '/opt/docsrs/rustwide/builds/essential-files-20210305-1.52.0-nightly-caca2121f/target/doc/FireSans-Medium.woff2' to '/tmp/essential-filessNMWC1/FireSans-Medium.woff2'
web_1         | 2021/03/06 10:38:14 [ERROR] docs_rs::build_queue: Failed to build package serde-1.0.123 from queue: couldn't copy '/opt/docsrs/rustwide/builds/essential-files-20210305-1.52.0-nightly-caca2121f/target/doc/FireSans-Medium.woff2' to '/tmp/essential-filessNMWC1/FireSans-Medium.woff2'
web_1         | Backtrace:    0: failure::backtrace::internal::InternalBacktrace::new
web_1         |    1: <failure::backtrace::Backtrace as core::default::Default>::default
web_1         |    2: rustwide::build::BuildBuilder::run
web_1         |    3: docs_rs::docbuilder::rustwide_builder::RustwideBuilder::add_essential_files
web_1         |    4: docs_rs::docbuilder::rustwide_builder::RustwideBuilder::update_toolchain
web_1         |    5: docs_rs::build_queue::BuildQueue::process_next_crate
web_1         |    6: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
web_1         |    7: docs_rs::utils::queue_builder::queue_builder
web_1         |    8: std::sys_common::backtrace::__rust_begin_short_backtrace
web_1         |    9: core::ops::function::FnOnce::call_once{{vtable.shim}}
web_1         |   10: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
web_1         |              at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/liballoc/boxed.rs:1081
web_1         |       <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
web_1         |              at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/liballoc/boxed.rs:1081
web_1         |       std::sys::unix::thread::Thread::new::thread_start
web_1         |              at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libstd/sys/unix/thread.rs:87
web_1         |   11: start_thread
web_1         |   12: __clone
web_1         |
web_1         | 2021/03/06 10:38:14 [WARN] docs_rs::utils::queue_builder: Lock file exits, skipping building new crates

Reduces the impact from failures like rust-lang#1305, rather than continuing to
build crates with the failing toolchain we will just stop building
crates until an admin has time to investigate the cause.
Comment on lines +90 to +94
if let Err(err) = builder.update_toolchain() {
log::error!("Updating toolchain failed, locking queue: {}", err);
self.lock()?;
return Err(err);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than calling update_toolchain in more places, do you think it makes sense to change build_package to return an error enum so we can tell where the error came from? The current change seems racy: a new nightly could be published between calling update_toolchain here and calling it in build_package.

I'm also slightly concerned that this doesn't handle errors when an admin runs cratesfyi build crate, but I don't know a simple way to handle that, and it seems rare for that to be the first build with a new toolchain anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the approach I tried first, but I got stuck with working out how to make it work with failure, I could take another attempt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, could you maybe add a new error type and try downcasting to that? Don't spend too much time on it :) this should work fine 99.9% of the time.

@jyn514 jyn514 added A-builds Area: Building the documentation for a crate S-waiting-on-author Status: This PR is incomplete or needs to address review comments labels Mar 6, 2021
@jyn514
Copy link
Member

jyn514 commented Apr 4, 2021

I think this is fine to merge - only the queue needs to know what the error type is, and the edge case for it to be racy is less than a few seconds a day:

let mut conn = self.db.get()?;
if !self.should_build(&mut conn, name, version)? {
return Ok(false);
}

@jyn514 jyn514 merged commit 47ff0a3 into rust-lang:master Apr 4, 2021
@Nemo157 Nemo157 deleted the lock-queue-on-update-failure branch May 31, 2023 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-builds Area: Building the documentation for a crate S-waiting-on-author Status: This PR is incomplete or needs to address review comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants